We address the problem of detecting non-stationary effects in time series (in particular
fractal time series) by means of the Diffusion Entropy Method (DEM). This means that
the experimental sequence under study, of size $N$, is explored with a window of size
$L \ll N$. The DEM makes efficient use of the statistical information available and,
consequently, in spite of the modest size of the window used, succeeds in revealing
local statistical properties and shows how they change as the window moves along
the experimental sequence. The method is also expected to be able to predict
catastrophic events before their occurrence.



1 Introduction 

The main aim of this paper is to illustrate a promising strategy to study
non-stationary processes. We prove that the method is efficient by means of the joint
study of real and artificial sequences, and we reach a conclusion that makes it
plausible to imagine the method at work to successfully predict the time of occurrence
of catastrophic events.

The method illustrated here is a suitable extension of the Diffusion Entropy
Method (DEM). The DEM is discussed in detail in other publications. Here
we limit ourselves to a concise illustration of this technique, so as to allow the
reader to understand the spirit of the method of this paper, at least on a
first reading, without consulting these earlier publications. The first step of this
technique is the same as that of the pioneering work in this field. This means that
the experimental sequence is converted into a kind of Brownian-like trajectory. The
second step aims at deriving many distinct diffusion trajectories with the technique
of moving windows of size $l$. The reader should not confuse the mobile window of size
$l$ with the mobile window of size $L$ that will be used later in this paper to detect
non-stationary properties. For this reason we shall refer to the mobile windows of
size $L$ as large windows, even if $L$ is relatively small, whereas the mobile
windows of size $l$ will be called small windows. The large mobile window has to be
interpreted as a sequence whose statistical properties are to be revealed, and it is
analyzed by means of small windows of size $l$, with $l < L$. The success of the method
depends on the fact that the DEM makes efficient use of the statistical information
available. In fact, the small windows overlap one another and are obtained by locating
their left border on the first site of the sequence, on the second site, and
so on. The adoption of overlapping windows is dictated by the wish to establish
a connection with the Kolmogorov-Sinai (KS) method. This also yields, as a
further beneficial effect, many more trajectories than the widely used method of
Detrended Fluctuation Analysis (DFA).

In conclusion, we create a conveniently large number of trajectories by gradually
moving the small window from the first position, with the left border of the small
window coinciding with the first site of the sequence, to the last position, with the
right border of the small window coinciding with the last site of the sequence. After
this stage, we use the resulting trajectories, all of them with the initial position
located at $x = 0$, to produce a probability distribution at "time" $l$. We evaluate the
Shannon entropy of this distribution, $S_d(l)$, and with easy mathematical arguments
we prove that, if the diffusion process undergoes a scaling of intensity $\delta$, then

$$S_d(l) = A + \delta \ln(l). \qquad (1)$$

Thus the parameter $\delta$ of the scaling condition, if this condition applies, can be
measured without recourse to any form of detrending.

This is the original motivation for the DEM. In this paper we want to prove
that the DEM does much more than detect scaling. In a stationary condition
the DEM not only detects the final scaling accurately, but it also affords a way
to monitor the regime of transition to the final thermodynamic condition. If the
sequence under study is affected by biases and non-stationary perturbations, the
attainment of the final regime of steady scaling can be cancelled and replaced by an
out-of-equilibrium regime changing in time under the influence of time-dependent
biases. We want to prove that the DEM can be suitably extended to face this
challenging non-stationary condition.

The outline of the paper is as follows. In Section 2 we briefly review the main
tenets of the DEM so as to make this paper, as mentioned earlier, as self-contained
as possible. In Section 3 we illustrate the extension of the DEM and we discuss
the fundamental problem of assessing the shortest portion, of size $L$, of
the sequence under study that is still large enough to make the DEM work.
In Section 4 we express our hope that this new method might serve prediction
purposes.

2 Diffusion Entropy 

Let us consider a sequence of $M$ numbers $\xi_i$, with $i = 1, \ldots, M$. The purpose of
the DEM algorithm is to establish the possible existence of a scaling, either normal
or anomalous, in the most efficient way possible, without altering the data with
any form of detrending. Let us first select an integer number $l$, fitting the
condition $1 \le l \le M$. As mentioned earlier, we shall refer to $l$ as "time".
For any given time $l$ we can find $M - l + 1$ sub-sequences defined by

$$\xi_i^{(s)} \equiv \xi_{i+s}, \qquad s = 0, \ldots, M - l. \qquad (2)$$

For each of these sub-sequences we build up a diffusion trajectory, labelled with the
index $s$, defined by the position

$$x^{(s)}(l) = \sum_{i=1}^{l} \xi_i^{(s)} = \sum_{i=1}^{l} \xi_{i+s}. \qquad (3)$$

Let us imagine this position as referring to a Brownian particle that at regular
intervals of time has been jumping forward or backward according to the prescription
of the corresponding sub-sequence of Eq. (2). This means that the particle,
before reaching the position that it holds at time $l$, has made $l$ jumps. The
jump made at the $i$-th step has the intensity $|\xi_i^{(s)}|$ and is forward or backward
according to whether the number $\xi_i^{(s)}$ is positive or negative.

We are now ready to evaluate the entropy of this diffusion process. To do that
we have to partition the $x$-axis into cells of size $\epsilon(l)$. When this partition is made
we have to label the cells. We count how many particles are found in the same cell
at a given time $l$. We denote this number by $N_i(l)$. Then we use this number to
determine the probability that a particle can be found in the $i$-th cell at time $l$,
$p_i(l)$, by means of

$$p_i(l) \equiv \frac{N_i(l)}{M - l + 1}. \qquad (4)$$

At this stage the entropy of the diffusion process at the time $l$ is determined and
reads

$$S_d(l) = -\sum_i p_i(l) \ln[p_i(l)]. \qquad (5)$$

The easiest way to proceed with the choice of the cell size, $\epsilon(l)$, is to assume it
independent of $l$ and determined by a suitable fraction of the square root of the
variance of the fluctuation $\xi(i)$.
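The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `diffusion_entropy`, the use of NumPy, and the choice of cells centered on integer multiples of the cell size are ours. The cell size is kept independent of $l$, as suggested above.

```python
import numpy as np

def diffusion_entropy(xi, l, cell_size):
    """Shannon entropy S_d(l) of the diffusion built from overlapping windows of size l."""
    xi = np.asarray(xi, dtype=float)
    M = xi.size
    if not 1 <= l <= M:
        raise ValueError("l must satisfy 1 <= l <= M")
    # c[k] = xi[0] + ... + xi[k-1], so that a window sum is a difference of two entries.
    c = np.concatenate(([0.0], np.cumsum(xi)))
    # Positions x^(s)(l) for s = 0, ..., M - l: one trajectory per window position.
    x = c[l:] - c[:M - l + 1]
    # Assign each position to a cell of width cell_size (assumption: cells centered on 0).
    cells = np.floor(x / cell_size + 0.5).astype(np.int64)
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()  # occupation probabilities; counts.sum() == M - l + 1
    return float(-np.sum(p * np.log(p)))

# Usage sketch: for an uncorrelated +/-1 sequence the slope of S_d(l) versus ln(l)
# is expected to approach the ordinary value 0.5.
rng = np.random.default_rng(0)
xi = rng.choice([-1.0, 1.0], size=100_000)
ls = np.arange(10, 101, 10)
S = np.array([diffusion_entropy(xi, int(l), cell_size=1.0) for l in ls])
slope = np.polyfit(np.log(ls), S, 1)[0]
print(f"estimated scaling: {slope:.2f}")
```

The cumulative sum makes every window sum a single subtraction, so all $M - l + 1$ overlapping trajectories are obtained at a cost linear in $M$ for each value of $l$.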

Before proceeding with the illustration of how the DEM works, it is
worth commenting on the way we define the trajectories. The method
we are adopting is based on the idea of a moving window of size $l$ that makes the
$s$-th trajectory closely correlated to the next, the $(s+1)$-th trajectory. The two
trajectories have $l - 1$ values in common. A motivation for our choice is given by
our wish to establish a connection with the Kolmogorov-Sinai (KS) entropy. The
KS entropy of a symbolic sequence is evaluated by moving a window of size $l$ along
the sequence. Any window position corresponds to a given combination of symbols,
and, from the frequency of each combination, it is possible to derive the Shannon
entropy $S(l)$. The KS entropy is given by the asymptotic limit $\lim_{l \to \infty} S(l)/l$. We
believe that the same sequence, analyzed with the DEM at the large values
of $l$ where $S(l)/l$ approaches the KS value, must yield a well-defined scaling $\delta$. To
realize this correspondence we carry out the determination of the diffusion entropy
using the same criterion of overlapping windows as that behind the KS entropy.

Details on how to deal with the transition from the short-time regime, sensitive
to the discrete nature of the process under study, to the long-time limit, where both
space and time can be perceived as continuous, are given elsewhere. Here we make
the simplifying assumption of considering times so large as to make the continuous
assumption valid. In this case the trajectories, built up with the procedure
illustrated above, correspond to the following equation of motion:

$$\frac{dx}{dt} = \xi(t), \qquad (6)$$

where $\xi(t)$ denotes the value that the time series under study takes at the $l$-th site.
This means that the function $\xi(l)$ is depicted as a function of $t$, thought of as a
continuous time $t = l$. In this case the Shannon entropy reads

$$S_d(t) = -\int_{-\infty}^{\infty} dx\, p(x,t) \ln[p(x,t)]. \qquad (7)$$

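We can derive with a simple treatment an analytical solution for the diffusion entropy
when the process is characterized by scaling, namely when

$$p(x,t) = \frac{1}{t^{\delta}} F\left(\frac{x}{t^{\delta}}\right). \qquad (8)$$

Let us plug Eq. (8) into Eq. (7). After some simple algebra, we get

$$S_d(\tau) = A + \delta \tau, \qquad (9)$$

where

$$A \equiv -\int_{-\infty}^{\infty} dy\, F(y) \ln[F(y)] \qquad (10)$$

and

$$\tau \equiv \ln(t). \qquad (11)$$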
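The algebra leading from the scaling condition to a linear dependence of the entropy on $\ln(t)$ can be made explicit. The sketch below takes the scaling form $p(x,t) = t^{-\delta} F(x/t^{\delta})$, uses the change of variable $y = x/t^{\delta}$, and assumes only that $F$ is normalized, $\int F(y)\,dy = 1$, which follows from the normalization of $p(x,t)$.

```latex
\begin{align*}
S_d(t) &= -\int_{-\infty}^{\infty} dx\, \frac{1}{t^{\delta}} F\!\left(\frac{x}{t^{\delta}}\right)
          \ln\!\left[\frac{1}{t^{\delta}} F\!\left(\frac{x}{t^{\delta}}\right)\right] \\
       &= -\int_{-\infty}^{\infty} dy\, F(y) \left\{ \ln[F(y)] - \delta \ln(t) \right\}
          \qquad (y = x/t^{\delta},\ dx = t^{\delta}\, dy) \\
       &= -\int_{-\infty}^{\infty} dy\, F(y) \ln[F(y)]
          + \delta \ln(t) \int_{-\infty}^{\infty} dy\, F(y) \\
       &= A + \delta \ln(t).
\end{align*}
```

The constant $A$ depends only on the shape of $F$, so the slope of the entropy against $\ln(t)$ isolates the scaling exponent regardless of the detailed form of the distribution.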

It is evident that this kind of technique to detect scaling does not imply any
form of detrending, and this is one of the reasons why some attention should be
devoted to it. It is also worth mentioning, as we prove now, that it yields the correct
scaling values even for the so-called Lévy walks, where the time dependence of the
second moment has an exponent which is different from the
scaling exponent of the Lévy process.

We therefore check the efficiency of this technique by studying the artificial
sequence used in earlier work. This sequence is built up in such a way as to realize
long sequences of either +'s or -'s. The probability of finding a sequence of only +'s
or only -'s of length $t$ is given by

$$\psi(t) = (\mu - 1) \frac{T^{\mu - 1}}{(T + t)^{\mu}}. \qquad (12)$$

Here we focus our attention on the condition $\mu < 3$ and we draw the reader's
attention to the interval $2 < \mu < 3$. In fact, this kind of sequence is the same as
that adopted in earlier work for a dynamic derivation of Lévy diffusion, which
shows up when the condition $2 < \mu < 3$ applies. It corresponds to a particle
travelling with constant velocity throughout the whole time interval corresponding
to either only +'s or only -'s, and changing direction, with no rest, at the end of any
string with the same symbols.
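A sequence of this kind can be generated by inverse-transform sampling of the string lengths. The sketch below is an illustration under assumptions of ours: it takes the waiting-time density in the form $\psi(t) = (\mu - 1) T^{\mu-1}/(T+t)^{\mu}$, whose survival probability is $[T/(T+t)]^{\mu-1}$, rounds the sampled lengths up to integers, and alternates the sign from one string to the next. The function name `svm_sequence` and its parameters are hypothetical.

```python
import numpy as np

def svm_sequence(n, mu, T, rng):
    """Return n symbols (+1 or -1) arranged in strings with power-law distributed lengths."""
    out = np.empty(n)
    filled = 0
    sign = 1.0
    while filled < n:
        u = 1.0 - rng.random()  # uniform in (0, 1], so the power below is finite
        # Invert the survival probability [T / (T + t)]**(mu - 1) = u for t.
        t = T * (u ** (-1.0 / (mu - 1.0)) - 1.0)
        length = max(1, int(np.ceil(t)))  # assumption: lengths rounded up to integers
        end = min(n, filled + length)
        out[filled:end] = sign
        filled = end
        sign = -sign  # assumption: the direction changes at the end of every string
    return out

rng = np.random.default_rng(1)
seq = svm_sequence(10_000, mu=2.5, T=10.0, rng=rng)
```

Alternating the sign guarantees that every string of equal symbols has exactly the sampled length; drawing the sign at random would merge neighbouring strings and slightly change the length distribution.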

We will refer to this model as the Symmetric Velocity Model (SVM). We know from
earlier theoretical work that the scaling of the resulting diffusion process when
$2 < \mu < 3$ is

$$\delta = \frac{1}{\mu - 1}. \qquad (13)$$

Note, however, that this diffusion process has a finite propagation front, with
ballistic peaks showing up at both $x = t$ and $x = -t$. The intensity of these peaks is
proportional to the correlation function

$$\Phi_{\xi}(t) = \left(\frac{T}{t + T}\right)^{\mu - 2}. \qquad (14)$$

As a consequence of this fact, the whole distribution does not have a single rescaling.
In fact, the distribution enclosed between the two peaks rescales with the $\delta$ of Eq. (13),
while the peaks are associated with $\delta = 1$. Furthermore, it is well known that the
scaling of the second moment is given by

$$\delta_H = \frac{4 - \mu}{2}. \qquad (15)$$

Thus, it is expected that the scaling detected by the DE method might not coincide
with the prediction of Eq. (13) for the whole period of time corresponding to the
presence of peaks of significant intensity. We think that the Lévy scaling of Eq. (13)
will show up at long times, when the peak intensity is significantly reduced. This
conjecture seems to be supported by the numerical results illustrated in Fig. 1. We
see in fact that the scaling predicted by Eq. (13) is reached after an extended
transient, of the order of about 20,000 in the scale of Fig. 1. This time interval is about
2000 times larger than the value assigned to the parameter $T$ of Eq. (12), which is,
in the case of Fig. 1, $T = 10$.

In conclusion, this section proves that the DE method applied to the SVM
yields, for the scaling parameter $\delta$, the correct value of Eq. (13), rather than the
value that would be obtained by measuring the variance of the diffusion process,
Eq. (15). However, the time necessary for this correct value to emerge is very
large. Furthermore, as shown in Fig. 2, the adoption of the SVM would make the
scaling parameter $\delta$ insensitive to $\mu$ in the whole interval $1 < \mu < 2$. This means
that the adoption of the DE method would not allow us to distinguish a process
with $\mu$ very close to 1 from one with $\mu$ very close to 2. This problem can be solved
by using different rules for the diffusion process: the random walker can, for instance,
always walk in the same direction, and only at the "time" when there is a passage from
a laminar region of +'s to one of -'s and vice versa. If this latter rule is adopted, then
it is easy to prove that the resulting $p(x,t)$ is an asymmetric Lévy distribution,
with a scaling $\delta = \mu - 1$ for $1 < \mu < 2$.

Throughout this paper, however, we apply the DEM with the rule corresponding
to the symmetric velocity model, with $\delta$ depending on $\mu$ as in Fig. 2. In the regime
of ordinary statistical mechanics ($\mu > 3$) the ordinary scaling is quickly attained,
while the condition of anomalous statistical mechanics, $\mu < 3$, is characterized by a
long transient regime, which is carefully recorded by the DEM. In this paper we
want to use the DEM to monitor the time dependence of the "rules" generating the
sequences under study.

Figure 1: The diffusion entropy as a function of time. The numerical method is applied to the
artificial sequence described above, with $\mu = 2.5$, studied according to the SVM prescription.
According to the theoretical arguments of the text the scaling parameter $\delta$ is the slope of the
straight line fitting the numerical results at large times, which yields in this case
$\delta = 2/3 = 1/(\mu - 1)$ ($T = 10$).

3 The new method at work with nonstationary sequences 

To illustrate the ideas that led us to propose the method of analysis with two moving
windows, let us begin by discussing the artificial sequence given by

$$\xi_b(t) = k\, \xi(t) + A \cos(\omega t). \qquad (16)$$

The second term on the right-hand side of this equation is a deterministic
contribution that might mimic, for instance, the seasonal periodicity of Ref. [1]. The
first term on the right-hand side is a fluctuation without memory that may or may
not be correlated with the harmonic bias.

Figure 2: $\delta$ as a function of $\mu$ according to the prescriptions of Ref. [3].

Fig. 3 refers to the case when the random fluctuation has no correlation with the
harmonic bias. It is convenient to illustrate first what happens when $k = 0$. This is the
case where the signal is totally deterministic. It would be nice if the entropy in this
case did not increase upon increasing $l$. However, we must notice that the method
of mobile windows implies that many trajectories are selected, the difference among
them being, in the deterministic case where $\xi_b(t) = A \cos(\omega t)$, a difference in initial
conditions. Entropy initially increases. This is due to the fact that the statistical
average over the initial conditions is perceived as a source of uncertainty. However,
after a time of the order of the period of the deterministic process a regression to
the condition of vanishing entropy occurs, and it keeps occurring at the
multiples of the period. Another very remarkable fact is that the maximum entropy value
is constant, thereby correctly signalling that we are in the presence of a periodic
signal, where the initial entropy increase, due to the uncertainty in the initial
conditions, is balanced by the recurrences. Let us now consider the effect of a
non-vanishing $k$. We see that the presence of even a very weak random component
causes an abrupt transition from the condition where the diffusion entropy
is bounded from above to a new condition where the recurrences are limited from
below by an entropy increase proportional to $0.5 \ln l$. In the asymptotic time regime
the DEM yields, as required, the proper scaling $\delta = 0.5$. However, we notice that it
might be of some interest for a method of statistical analysis to give information on
the extended regime of transition to the final thermodynamic condition. We notice
that if the DEM is interpreted as a method of scaling detection, it might also give
the impression that a scaling faster than the ballistic $\delta = 1$ is possible. This would be
misleading. However, this aspect of the DEM, if conveniently used, can become an
efficient method to monitor the non-stationary nature of the sequence under study.
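The recurrences of the deterministic case, and their removal by a weak random component, can be checked numerically. The following sketch is only illustrative: the Gaussian form of the noise, the integer period `P`, the cell size, and the helpers `diffusion_entropy` and `biased_sequence` are assumptions of ours, not specifications taken from the text.

```python
import numpy as np

def diffusion_entropy(xi, l, cell_size):
    """Shannon entropy of the positions reached by all overlapping windows of size l."""
    xi = np.asarray(xi, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(xi)))
    x = c[l:] - c[:xi.size - l + 1]  # window sums x^(s)(l)
    cells = np.floor(x / cell_size + 0.5).astype(np.int64)
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def biased_sequence(n, k, A, period, rng):
    """xi_b(t) = k * xi(t) + A * cos(omega * t), with omega = 2 * pi / period."""
    t = np.arange(n)
    noise = rng.standard_normal(n)  # assumption: uncorrelated Gaussian fluctuation
    return k * noise + A * np.cos(2.0 * np.pi * t / period)

rng = np.random.default_rng(2)
P = 100
pure = biased_sequence(10_000, k=0.0, A=1.0, period=P, rng=rng)
noisy = biased_sequence(10_000, k=0.1, A=1.0, period=P, rng=rng)

S_half = diffusion_entropy(pure, P // 2, cell_size=0.5)  # spread due to initial conditions
S_full = diffusion_entropy(pure, P, cell_size=0.5)       # recurrence after one period
S_noisy = diffusion_entropy(noisy, P, cell_size=0.5)     # recurrence lifted by the noise
print(S_half, S_full, S_noisy)
```

With $k = 0$ every window of one full period sums to zero, so all trajectories fall in the same cell and the entropy vanishes; with $k \neq 0$ the random component spreads the trajectories over several cells at the same time $l$.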

In the special case where the fluctuation $\xi(t)$ is correlated or anticorrelated with
the bias, the numerical results illustrated in Fig. 4 show that the time evolution of
the diffusion entropy is qualitatively similar to that of Fig. 3. The correlation
between the first and the second term on the right-hand side of Eq. (16) is established
by assuming

$$\xi(t) = \zeta_0(t) \cos(\omega t), \qquad (17)$$

where $\zeta_0(t)$ is the genuine independent fluctuation, without memory, whose intensity
is modulated to establish a correlation with the second term. It is of some interest
to mention what happens when $A = 0$, $k = 1$, and consequently $\xi_b(t)$ coincides with
$\xi(t)$ of Eq. (17). In this case we get the straight (solid) line of Fig. 4. This means
that the adoption of the assumption that the process is stationary yields a result
that is independent of the modulation.

Figure 3: The diffusion entropy $S_d(l)$ as a function of time $l$ for different sequences of the type
of Eq. (16).

We use this interesting case to illustrate the extension of the DEM, which is
the main purpose of this paper. As mentioned earlier, this extension is based on
the use of two mobile windows, one of length $L$ and the traditional ones of length
$l \ll L$. This means that a large window of size $L$, with $L \ll T = 2\pi/\omega$, is located
at a given position $i$ of the sequence under study, with $i < N - L$, and the portion
of the sequence contained within the window is thought of as being the sequence
under study. We record the resulting $\delta$ (obtained with a linear regression method)
and then we plot it as a function of the position $i$. We show in Fig. 5 that this way
of proceeding has the nice effect of making the periodic modulation emerge.
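The two-window procedure can be sketched as follows. This is a minimal illustration under assumptions of ours: the names `local_scaling` and `diffusion_entropy`, the stride `step` between successive positions of the large window, the range of small-window sizes, and the cell size (the standard deviation of the data inside each large window) are not prescribed by the text. The scaling is obtained by a least-squares fit of $S_d(l)$ against $\ln l$.

```python
import numpy as np

def diffusion_entropy(xi, l, cell_size):
    """Shannon entropy of the positions reached by all overlapping windows of size l."""
    c = np.concatenate(([0.0], np.cumsum(xi)))
    x = c[l:] - c[:len(xi) - l + 1]
    cells = np.floor(x / cell_size + 0.5).astype(np.int64)
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def local_scaling(xi, L, step, l_values):
    """Return the positions i of the large window and the slope fitted inside each one."""
    xi = np.asarray(xi, dtype=float)
    positions = np.arange(0, xi.size - L + 1, step)
    log_l = np.log(l_values)
    deltas = []
    for i in positions:
        window = xi[i:i + L]  # the large window is treated as the sequence under study
        cell = window.std()   # assumption: cell size equal to the local standard deviation
        S = [diffusion_entropy(window, int(l), cell) for l in l_values]
        deltas.append(np.polyfit(log_l, S, 1)[0])  # slope of S_d(l) versus ln(l)
    return positions, np.array(deltas)

# Usage sketch on stationary Gaussian noise, where the slope should stay near 0.5.
rng = np.random.default_rng(3)
noise = rng.standard_normal(20_000)
pos, delta = local_scaling(noise, L=5_000, step=5_000, l_values=np.arange(1, 21))
print(pos, delta)
```

For a modulated sequence one would use a smaller `step`, so that the fitted slope can be plotted as a function of the position of the large window.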

Let us now improve the method to face non-stationary conditions even further.
As we have seen, the presence of time-dependent conditions tends to postpone or to
cancel the attainment of a scaling condition. Therefore, let us give up using
Eq. (9) and let us proceed as follows. For any large mobile window of size $L$ let us
call $l_{max}$ the maximum size of the small windows. Let us call $n$ the position of the
left border of the large window, and let us evaluate the following property:

$$I(n) \equiv \sum_{l=2}^{l_{max}} \frac{S_d(l) - [S_d(1) + 0.5 \ln l]}{l}. \qquad (18)$$

Figure 4: The diffusion entropy $S_d(l)$ as a function of time $l$ for different sequences of the type
of Eq. (16) with the prescription of Eq. (17) for the random component.

Figure 5: The method of the two mobile windows applied to a sequence given by Eq. (16) with
$A = 0$ and $\xi(t)$ given by Eq. (17). The dashed line represents the amplitude of the sinusoid (not to
scale) corresponding to the position $i$ of the left border of the large moving window.

Figure 6: The method of the two moving windows with $l_{max} = 30$ applied to the analysis of
an artificial CMM sequence with periodic parameter $\epsilon$. The period of the variation of $\epsilon$ is 5000
bps and the analysis is carried out with moving windows of size 2000 bps. Inset: Fourier spectral
analysis of $I(n)$.

The quantity $I(n)$ detects the deviation from the condition of increase that the
diffusion entropy would have in the random case. Since in the regime of transition
the entropy increase can be much slower than in the corresponding random case,
the quantity $I(n)$ can also take negative values. This indicator affords a satisfactory
way to detect local properties. As an example, Fig. 6 shows a case based on the
DNA model of Ref. [8], called Copying Mistake Map (CMM). This is a sequence
of symbols 0 and 1 obtained from the joint action of two independent sequences,
one equivalent to tossing a coin and the other equivalent to establishing randomly a
sequence of patches whose length is distributed as an inverse power law with index
$\mu$ fitting the condition $2 < \mu < 3$. The probability of using the former sequence is
$1 - \epsilon$ and that of using the latter is $\epsilon$. We choose a time-dependent value of $\epsilon$:

$$\epsilon = \epsilon_0 [1 - \cos(\omega t)]. \qquad (19)$$
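A CMM-like sequence with the periodic $\epsilon$ of Eq. (19) can be generated as sketched below. Several details are assumptions of ours rather than specifications of Ref. [8]: the patch lengths are drawn from the density $(\mu - 1) T^{\mu-1}/(T+t)^{\mu}$ and rounded up to integers, each patch is filled with a randomly chosen symbol, and the names `power_law_patches` and `cmm_sequence` are hypothetical.

```python
import numpy as np

def power_law_patches(n, mu, T, rng):
    """Sequence of 0/1 patches whose lengths follow an inverse power law with index mu."""
    out = np.empty(n, dtype=np.int64)
    filled = 0
    while filled < n:
        u = 1.0 - rng.random()  # uniform in (0, 1]
        length = max(1, int(np.ceil(T * (u ** (-1.0 / (mu - 1.0)) - 1.0))))
        end = min(n, filled + length)
        out[filled:end] = rng.integers(0, 2)  # assumption: random symbol for each patch
        filled = end
    return out

def cmm_sequence(n, mu, T, eps0, period, rng):
    """Mix a coin-tossing sequence and a patch sequence with a time-dependent probability."""
    t = np.arange(n)
    eps = eps0 * (1.0 - np.cos(2.0 * np.pi * t / period))  # requires eps0 <= 0.5
    coin = rng.integers(0, 2, size=n)
    patches = power_law_patches(n, mu, T, rng)
    use_patches = rng.random(n) < eps  # patch sequence used with probability eps
    return np.where(use_patches, patches, coin)

rng = np.random.default_rng(4)
seq = cmm_sequence(20_000, mu=2.5, T=10.0, eps0=0.4, period=5_000, rng=rng)
```

Since $\epsilon$ oscillates between 0 and $2\epsilon_0$, the sequence alternates between stretches dominated by the coin and stretches where the correlated patches are more visible.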

In Fig. 6 we show how this periodicity is perceived by using the two-window
generalization of the DEM proposed in this paper.
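The indicator $I(n)$ can be computed for each position of the large window as in the following sketch. It rests on assumptions of ours: the indicator is read as $I(n) = \sum_{l=2}^{l_{max}} \{S_d(l) - [S_d(1) + 0.5 \ln l]\}/l$, the cell size is fixed by the caller, and the names `diffusion_entropy`, `indicator` and `sliding_indicator` are hypothetical.

```python
import numpy as np

def diffusion_entropy(xi, l, cell_size):
    """Shannon entropy of the positions reached by all overlapping windows of size l."""
    c = np.concatenate(([0.0], np.cumsum(xi)))
    x = c[l:] - c[:len(xi) - l + 1]
    cells = np.floor(x / cell_size + 0.5).astype(np.int64)
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def indicator(window, l_max, cell_size):
    """Weighted deviation of S_d(l) from the growth expected in the random case."""
    S1 = diffusion_entropy(window, 1, cell_size)
    total = 0.0
    for l in range(2, l_max + 1):
        random_case = S1 + 0.5 * np.log(l)  # entropy expected for a random sequence
        total += (diffusion_entropy(window, l, cell_size) - random_case) / l
    return total

def sliding_indicator(xi, L, step, l_max, cell_size):
    """Evaluate the indicator for every position n of the left border of the large window."""
    xi = np.asarray(xi, dtype=float)
    positions = np.arange(0, xi.size - L + 1, step)
    values = np.array([indicator(xi[n:n + L], l_max, cell_size) for n in positions])
    return positions, values

rng = np.random.default_rng(5)
noise = rng.standard_normal(20_000)
I_noise = indicator(noise, l_max=20, cell_size=1.0)           # random case
I_const = indicator(np.ones(2_000), l_max=20, cell_size=1.0)  # no entropy increase at all
pos, vals = sliding_indicator(noise, L=5_000, step=5_000, l_max=10, cell_size=1.0)
```

For uncorrelated noise the indicator is expected to stay close to zero, while a sequence whose entropy does not grow at all gives a negative value, in line with the remark that $I(n)$ can become negative when the entropy increases more slowly than in the random case.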

As a final example to show the efficiency of the new method of analysis, let
us address the problem of the search for hidden periodicities in DNA sequences.
Fig. 7 shows a distinct periodic behavior for the human T-cell receptor alpha/delta
locus. A period of about 990 base pairs is very clear in the first part of the sequence
(promoter region), while several periodicities of the order of 1000 base pairs are
distributed along the whole sequence.

Figure 7: The method of two mobile windows applied to the analysis of the human DNA sequence.
The method of two mobile windows ($l_{max} = 20$, $L = 512$) detects a periodicity of 990 bps. Inset:
Fourier spectral analysis of $I(n)$.

These periodicities, probably due to DNA-protein interactions in active eukaryotic
genes, are expected by biologists, but the current technology is not yet adequate
to deal with this issue, either from the experimental or from the computational
point of view: such a behavior cannot be analyzed by means of crystallographic or
structural NMR methods, nor would current or near-future computing
facilities allow molecular dynamics studies of systems of the order of 10^ atoms or 
more. 

4 Conclusions 

The research work illustrated in this paper shows that the DEM is a very efficient
way to detect the departure from ordinary Brownian motion with the shortest
possible sequence. On the basis of these results we are confident that it will be
possible to predict the occurrence of catastrophic events: earthquakes, heart
attacks, stock-market crashes, and so on. We think that if all these unfortunate events
are anticipated by a correlation change lasting for a fairly extended time period,
then the DEM, within the double-window procedure illustrated here, will signal
their later occurrence in time. We refer to this method of analysis as Complex
Analysis of Sequences via Scaling AND Randomness Assessment (CASSANDRA),
and we hope to prove by means of future research work that its prophetic power is
worthy of consideration. We hope that the CASSANDRA algorithm will have more
fortune and will receive more credit than the daughter of Priam and Hecuba.